Executive Summary

In this report we analyze the overall development of the LEGO company. Specifically, we examine how its marketing strategy has changed over the years, the technological advancement of its production line, and the creativity involved in creating and expanding the themes of LEGO sets. Our in-depth analysis aims to provide enough information to anticipate the future that lies ahead of the LEGO company.
Our first experiments show that LEGO's marketing has changed considerably. The data suggest that LEGO is gradually attracting a more adult audience, and this trend is likely to continue, as such a strategy appears to be more profitable for the company. The growing number of sets released each year suggests that this strategy works well and that LEGO bricks remain in demand. The number and color variety of the bricks have also grown, which indicates that production technology has kept pace with this demand. This growth has been so rapid in recent years that we expect it to slow down and eventually level off. Additionally, visualizing the hierarchy of LEGO themes revealed some interesting patterns. Most notably, the majority of modern themes do not have subthemes yet, which suggests that LEGO plays it safe: it first creates multiple standalone themes and then expands only the few that meet public expectations. This pattern has held throughout LEGO's history, so it is unlikely to change.
We recommend more targeted research into the specific themes that have been expanded, as this information may help in designing successful future sets. We also recommend analyzing themes that have not succeeded, to avoid repeating similar mistakes.

Introduction

This report was produced as an assignment for the Data Visualization course. It analyzes the overall development of the LEGO company, focusing on changes in marketing strategy over the years, the technological advancement of the production line, and the creativity involved in creating and expanding the themes of LEGO sets. It is well known that LEGO has grown since its beginnings; our in-depth analysis of the topics above aims to measure exactly how. Additionally, we try to draw conclusions about the decisions of the company's executives based on our observations. We also attempt to predict the company's future growth and the direction it may take, and we present possible areas for further research.

Methodology

To perform the following experiments, we used the LEGO data set published by Rebrickable, available here. The schema diagram of the LEGO data files is shown below (image from the Rebrickable website).

All calculations and graphs were produced in R using the following libraries:

  • ggplot2 - used to create multiple graphs
  • dplyr - used to perform data transformations
  • knitr - used to create the report files
  • ggraph - used to create Graph 6
  • igraph - used to create Graph 6
  • plotly - used to add interactivity to multiple graphs
  • circlepackeR - used to create the Bonus Graph (additional information available here)
  • data.tree - used to perform data transformations for the Bonus Graph
  • tidyr - used to perform data transformations

The data files must be placed in a separate directory called Data, located in the same directory as the source file.
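The report's own loading code is written in R and is not reproduced here. As a minimal, hypothetical Python sketch (standard library only), reading one of the Rebrickable CSV files from the Data directory could look like the following. The helper name `load_table` and the file name `sets.csv` are assumptions, and all values are returned as strings.

```python
import csv
from pathlib import Path


def load_table(name, data_dir="Data"):
    """Read one CSV file (e.g. "sets" -> Data/sets.csv) into a list of dicts.

    Hypothetical helper: a relative data_dir such as the default "Data" is
    resolved against the current working directory, so run the script from
    the directory that contains Data/. All values are kept as strings.
    """
    path = Path(data_dir) / f"{name}.csv"
    if not path.is_file():
        raise FileNotFoundError(f"expected {path}; place the data files in {data_dir}/")
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Raising a clear error for a missing file makes a wrongly placed Data directory easy to diagnose.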

Note that this report was designed primarily for HTML output; PDF output may not work.

  • The circlepackeR package cannot be installed in the usual way, so it is installed from GitHub with the help of an additional package, devtools. The required command, devtools::install_github("jeromefroe/circlepackeR"), can be found in the setup block. More information about the package can be found here and here.

Results & findings

Graph 1

This graph shows the number of sets according to the number of parts they contain. Each bar covers the interval from \(100n\) to \(100(n+1)\) parts, for \(n = 0, 1, 2, \ldots\). The plot is limited to sets with fewer than 1500 parts, as larger sets are sparse and add little useful information.
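The binning described above can be sketched in Python as follows. This is an illustration rather than the report's R/ggplot2 code: the function name `part_count_histogram` is hypothetical, the intervals are assumed to be half-open (a set with exactly 100 parts falls into the bar starting at 100), and the input is assumed to be the part counts of the sets.

```python
from collections import Counter


def part_count_histogram(num_parts, bin_width=100, limit=1500):
    """Count sets per part-count bin [bin_width * n, bin_width * (n + 1)).

    Sets with `limit` parts or more are dropped, mirroring the cut-off
    used for Graph 1. Returns {bin start: number of sets}, sorted by bin.
    """
    counts = Counter(
        (parts // bin_width) * bin_width for parts in num_parts if parts < limit
    )
    return dict(sorted(counts.items()))
```

For example, `part_count_histogram([5, 43, 150, 199, 250, 1200, 3000])` returns `{0: 2, 100: 2, 200: 1, 1200: 1}`; the 3000-part set is removed by the cut-off.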
This visualization shows that most sets contain no more than 200 parts. Based on production frequency, set sizes can be roughly divided into three groups: small sets of up to 200 parts (the most common), medium sets of up to 1000 parts (less common), and large sets of more than 1000 parts (the least common). The relations between these groups are examined in the following experiments.
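A minimal sketch of this three-way classification follows. It assumes that the boundary values 200 and 1000 belong to the smaller group, which the report does not state explicitly; the function name `size_group` is hypothetical, and the report's classification was done in R.

```python
def size_group(num_parts, small_max=200, medium_max=1000):
    """Assign a set to one of the three size groups.

    Assumed boundaries: up to small_max parts is "small", up to
    medium_max parts is "medium", anything larger is "large".
    """
    if num_parts <= small_max:
        return "small"
    if num_parts <= medium_max:
        return "medium"
    return "large"
```

Keeping the cut-offs as parameters makes it easy to check how sensitive the group proportions are to the exact boundaries.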

Graph 2

This graph shows the number of different sets released each year, divided into the three size groups defined above. Each bar represents a single year of LEGO's existence, and its colored segments correspond to the groups in the legend.
This visualization shows that the number of sets has increased over time, reaching its peak in 2021. It also confirms that sets with more than 1000 parts are the rarest, while sets with fewer than 200 parts are the most common. However, this graph does not make it easy to compare the proportions of the three groups across years. These proportions are essential for understanding LEGO's main target market, and the next experiment examines them.
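The per-year counts behind a stacked bar chart like this one amount to a group-by over year and size group. The Python sketch below illustrates that aggregation under stated assumptions: each set is given as a hypothetical `(year, num_parts)` pair, the function names are hypothetical, and the size cut-offs (up to 200 parts is small, up to 1000 is medium, otherwise large) are assumed rather than taken from the report's R code.

```python
from collections import Counter, defaultdict


def size_group(num_parts, small_max=200, medium_max=1000):
    # Assumed cut-offs: up to 200 parts is small, up to 1000 is medium.
    if num_parts <= small_max:
        return "small"
    if num_parts <= medium_max:
        return "medium"
    return "large"


def sets_per_year_by_group(sets):
    """Count sets per (year, size group); sets is an iterable of (year, num_parts)."""
    table = defaultdict(Counter)
    for year, parts in sets:
        table[year][size_group(parts)] += 1
    # Sort by year so the result can be plotted bar by bar in order.
    return dict(sorted(table.items()))
```

Each value in the result is a `Counter`, so a group that did not occur in a given year reads as zero.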

Graph 3

This graph shows the proportions of the three set-size groups over the years. Each bar represents a single year of LEGO's existence and is divided into colored segments, as shown in the legend. No set data are available for two of the years.
This plot shows that small sets (\(x \le 200\) parts) were the most common group every year, but some interesting conclusions can still be drawn. First, LEGO began producing medium sets (\(200 < x \le 1000\)) in the 1960s, and from then on their proportion mostly grew. Second, LEGO started producing sets with more than 1000 parts around the 1990s, and their proportion has also been growing ever since. This suggests that LEGO is broadening the age range of its consumers by producing sets aimed at older audiences, while still producing sets designed for children and teenagers.
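Turning the yearly counts into proportions means dividing each group's count by the total number of sets released that year. A hypothetical Python sketch follows, assuming `(year, num_parts)` pairs as input and the same assumed cut-offs (up to 200 parts small, up to 1000 medium); the function name is illustrative, not taken from the report. Years without any sets simply do not appear in the result, which matches the two empty years mentioned above.

```python
from collections import Counter, defaultdict

GROUPS = ("small", "medium", "large")


def group_shares_by_year(sets, small_max=200, medium_max=1000):
    """sets: iterable of (year, num_parts) pairs.

    Returns {year: {group: share}}; the shares of one year sum to 1.
    Assumed cut-offs: <= small_max parts is small, <= medium_max is medium.
    """
    counts = defaultdict(Counter)
    for year, parts in sets:
        if parts <= small_max:
            group = "small"
        elif parts <= medium_max:
            group = "medium"
        else:
            group = "large"
        counts[year][group] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        # Missing groups read as 0 from the Counter, giving a share of 0.0.
        shares[year] = {group: counts[year][group] / total for group in GROUPS}
    return shares
```

For example, three small sets and one medium set in 1960 give shares of 0.75, 0.25 and 0.0.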

Graph 4

This graph shows the number of different bricks used in the sets released each year and the distribution of their colors. Each bar represents a single year of LEGO's existence, and its segments are drawn in the actual colors of the bricks. Transparency and other special features of the bricks were ignored.
After analyzing LEGO's marketing strategy, it is also important to analyze the technological growth of the brick production line. This plot shows that the variety of bricks has grown almost continuously, and that the colors in use now span a far wider range of hues than in the early years. This suggests that the company has developed the technology to produce a large variety of bricks and colors at scale, making it possible to reach larger audiences.
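Counting bricks per year requires linking parts to sets: in the Rebrickable schema, parts are listed per inventory, each inventory belongs to a set, and the set carries the release year. The sketch below assumes these tables have already been reduced to plain Python dictionaries and tuples (the input shapes and the function name are hypothetical), and it counts distinct (part, color) combinations per year, broken down by color. Whether Graph 4 counts exactly this quantity is an assumption.

```python
from collections import Counter, defaultdict


def distinct_parts_by_color(sets, inventories, inventory_parts):
    """Count distinct (part_num, color_id) pairs per release year, grouped by color.

    Assumed inputs, modelled on the Rebrickable tables:
      sets:            {set_num: year}
      inventories:     {inventory_id: set_num}
      inventory_parts: iterable of (inventory_id, part_num, color_id)
    """
    seen = defaultdict(set)  # year -> {(part_num, color_id)}
    for inventory_id, part_num, color_id in inventory_parts:
        year = sets.get(inventories.get(inventory_id))
        if year is None:
            continue  # part belongs to an unknown inventory or set
        seen[year].add((part_num, color_id))
    # For each year: {color_id: number of distinct parts in that color}.
    return {
        year: Counter(color_id for _, color_id in pairs)
        for year, pairs in sorted(seen.items())
    }
```

Using a set per year ensures that a part appearing in several sets of the same year is counted only once.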

Graph 5

This graph shows the number of themes used each year and whether each theme has a parent theme. Each bar represents a single year of LEGO's existence and is divided into two subcategories, as shown in the legend. An inset zooms in on LEGO's early years to show the distribution of themes in that period more clearly.
Having examined the advancement of technology and marketing, we now look at how the number of themes changed over the years. This graph shows that the diversity of themes has grown throughout LEGO's whole lifetime. Additionally, more and more standalone themes are being produced, while previously released themes continue to be expanded. Interestingly, the themes of the very first sets already had parent themes, which suggests several possibilities: the data were classified incorrectly, the data are incomplete, or, most likely, the themes of the earliest sets were later reclassified as subthemes of a newer theme. We explore the theme hierarchy in the next experiment to draw more accurate and in-depth conclusions.
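The split in this graph can be reproduced by collecting the distinct themes used in each year and checking whether each one has a parent. A hypothetical Python sketch follows: `sets` is assumed to be a list of `(year, theme_id)` pairs and `parent_of` a mapping from theme id to parent id, with `None` for top-level themes (an empty `parent_id` in the CSV file would first have to be converted to `None`). The function name is illustrative.

```python
from collections import defaultdict


def themes_per_year(sets, parent_of):
    """Count the distinct themes used per year, split by whether they have a parent.

    sets:      iterable of (year, theme_id) pairs
    parent_of: {theme_id: parent theme_id, or None for top-level themes}
    Each theme is counted at most once per year.
    """
    used = defaultdict(set)
    for year, theme_id in sets:
        used[year].add(theme_id)

    result = {}
    for year in sorted(used):
        with_parent = sum(1 for t in used[year] if parent_of.get(t) is not None)
        result[year] = {
            "with parent": with_parent,
            "without parent": len(used[year]) - with_parent,
        }
    return result
```

Counting distinct themes rather than sets keeps a year with many sets in one theme from dominating the bar.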

Graph 6

This graph shows how themes were expanded and the year in which the first set of each theme was released. Each circle represents a theme and may contain other circles, which represent its subthemes. The color of each circle's rim indicates the year in which the first set of that theme was released; the year can be read from the legend on the side. Unfortunately, theme names are omitted, as they would make the graph unreadable.
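The rim color requires the year of the first set released in each theme, i.e. the minimum release year per theme. A minimal Python sketch follows, assuming `(year, theme_id)` pairs as input; the function name is hypothetical, and themes without any sets receive no year.

```python
def first_release_year(sets):
    """sets: iterable of (year, theme_id) pairs. Returns {theme_id: earliest year}."""
    first = {}
    for year, theme_id in sets:
        # Keep the smallest year seen so far for each theme.
        if theme_id not in first or year < first[theme_id]:
            first[theme_id] = year
    return first
```

For example, `first_release_year([(1980, 1), (1975, 1), (1990, 2)])` returns `{1: 1975, 2: 1990}`.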
As noted in the previous experiment, Graph 5 left some questions about the earliest themes unanswered. This representation supports the most probable explanation: some older themes are categorized as subthemes of newer ones. Additionally, the majority of themes were never expanded, which may suggest that LEGO consumers lose interest quickly and prefer a wide variety of themes over many variations of a single theme. A few exceptions have multiple levels of hierarchy, suggesting that some specific topics are popular with the public and prompt LEGO to create ever more specific subthemes. Moreover, it is usually the older themes that are expanded, while most newer themes are either subthemes of older themes or completely new themes that have yet to be expanded. Interestingly, one group was already expanded in LEGO's early days but has received no further expansions since. To examine individual cases in detail, we prepared an additional interactive representation, which remains readable even with theme names included.
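The number of hierarchy levels under each top-level theme, which distinguishes the few deeply expanded themes from the many unexpanded ones, can be computed from the parent links alone. A hypothetical Python sketch follows, assuming a mapping from theme id to parent id (`None` for top-level themes) and an acyclic hierarchy; the function name is illustrative.

```python
from collections import defaultdict


def hierarchy_depths(parent_of):
    """Return {top-level theme id: number of hierarchy levels, itself included}.

    parent_of maps theme id -> parent theme id (None for top-level themes).
    A top-level theme without subthemes has depth 1. The hierarchy is assumed
    to be acyclic; a cycle would cause infinite recursion.
    """
    children = defaultdict(list)
    for theme, parent in parent_of.items():
        if parent is not None:
            children[parent].append(theme)

    def depth(theme):
        # One level for this theme plus the deepest chain of subthemes below it.
        return 1 + max((depth(child) for child in children[theme]), default=0)

    return {theme: depth(theme) for theme, parent in parent_of.items() if parent is None}
```

For example, `hierarchy_depths({1: None, 2: 1, 3: 2, 4: None, 5: 1})` returns `{1: 3, 4: 1}`: theme 1 has two levels of subthemes below it, while theme 4 was never expanded.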

Bonus Graph

This graph shows the same information as Graph 6, with the theme names added so that specific themes can be investigated. To see deeper levels of the hierarchy, click on a subcategory; to zoom out, click outside the circle. The names of themes that have not yet been expanded are still hard to read, but they can be seen by clicking precisely on the corresponding circle.
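Interactive circle-packing charts are typically driven by a nested structure in which each node has a name and either a list of children or a size. The report builds this structure in R with data.tree; the Python sketch below only illustrates the idea. The field names `name`, `children` and `size` follow the common d3 convention rather than a confirmed circlepackeR requirement, and the input mappings, the function name and the root label "LEGO" are hypothetical.

```python
from collections import defaultdict


def theme_tree(names, parent_of, root_name="LEGO"):
    """Nest themes into {"name": ..., "children": [...]} dictionaries.

    names:     {theme_id: theme name}
    parent_of: {theme_id: parent theme_id, or None for top-level themes}
    Leaf themes get "size": 1 so that each unexpanded theme is drawn as one circle.
    """
    children = defaultdict(list)
    for theme, parent in parent_of.items():
        children[parent].append(theme)  # top-level themes collect under None

    def build(theme):
        node = {"name": names[theme]}
        kids = sorted(children[theme])
        if kids:
            node["children"] = [build(kid) for kid in kids]
        else:
            node["size"] = 1
        return node

    return {"name": root_name, "children": [build(t) for t in sorted(children[None])]}
```

Passing the result to `json.dumps` yields JSON in that nested form.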

Discussion

The accuracy and significance of our experiments depend entirely on the data set. It may lack some information, but the most significant data appear to be present. Our confidence in the results is supported by the fact that the findings derived from the graphs agree closely with current and previously observed trends. Our first experiments examine one of these trends and show that LEGO's marketing has changed considerably: the data suggest that LEGO has changed its strategy and is gradually attracting a more adult audience, a trend that is likely to continue, as such a strategy appears to be more profitable for the company. Far more sets are also being released now, which suggests that the marketing strategy is working. This growth also reflects the technological advancement of brick production: as demand grows, LEGO is able not only to produce enough bricks but also to introduce new shapes and colors almost every year. However, we believe that this growth is slowing and will level off in the near future. The number of different bricks has grown so quickly in recent years that the growth is bound to slow down, as human creativity has its limits; a slight drop in the number of bricks can already be seen in 2022.
Creativity can also be assessed through the hierarchy of themes, which revealed some interesting patterns. For example, LEGO does not abandon its older themes: it either makes them subthemes of newer themes, which is rare, or, far more often, extends them with new ideas.
We also observe that most modern themes do not have subthemes yet, which suggests that LEGO plays it safe: it first creates multiple standalone themes and then expands only the few that meet public expectations. This appears to have been its strategy from the very beginning, and it seems to work well, so we do not expect much to change in this respect. There are a few exceptions to this pattern, but they form a very small part of the whole theme space.

Conclusion & recommendations

The LEGO company seems to have made the right decisions throughout its history and continues to do so. Its current marketing strategies appear well planned, and the progression of brick production is closely tied to these plans, so problems of unavailability or overproduction seem unlikely. The company is likely to keep growing, but the pace of expansion will probably slow until it levels off. We suggest more targeted research into themes that can be described as "liked" by the public, as judged by their number of subthemes, and into why some old themes failed to expand. This information may be crucial for developing successful future themes.